Chaotic system optimal tracking using data-based synchronous method with unknown dynamics and disturbances
Song Ruizhuo¹ and Wei Qinglai²†
¹School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
²The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

† Corresponding author. E-mail: qinglai.wei@ia.ac.cn

Abstract

We develop an optimal tracking control method for chaotic systems with unknown dynamics and disturbances. The method allows the optimal cost function and the corresponding tracking control to update synchronously. According to the tracking error and the reference dynamics, the augmented system is constructed, and the optimal tracking control problem is defined. Policy iteration (PI) is introduced to solve the min-max optimization problem. The off-policy adaptive dynamic programming (ADP) algorithm is then proposed to find the solution of the tracking Hamilton–Jacobi–Isaacs (HJI) equation online, using only measured data and without any knowledge of the system dynamics. A critic neural network (CNN), an action neural network (ANN), and a disturbance neural network (DNN) are used to approximate the cost function, control, and disturbance, respectively. The weights of these networks compose the augmented weight matrix, which is proven to be uniformly ultimately bounded (UUB). The convergence of the tracking error system is also proven. Two examples are given to show the effectiveness of the proposed synchronous solution method for the chaotic system tracking problem.

PACS: 05.45.Gg
1. Introduction

Chaotic systems have complex nonlinear dynamics, and their responses exhibit specific characteristics such as sensitivity to initial conditions, broad Fourier transform spectra, and irregular motion in phase space.[1–4] An early development in the history of chaotic systems dates back to 1963, when the American meteorologist Lorenz proposed a system of equations to simulate weather changes. The Lorenz system, as the first chaotic model, revealed the complex and fundamental behaviors of nonlinear dynamical systems.[5] The concept of the generalized Lorenz system was then extended by Lü, and a class of generalized Lorenz-like systems was discussed.[6] Until now, many efficient approaches have been proposed for controlling chaotic systems, such as the impulsive control method,[7–9] the adaptive dynamic programming (ADP) method,[10] and neural adaptive control.[11] Most of these methodologies consider the chaotic system without disturbances.

In this paper, we provide a new method, based on ADP algorithms, for designing an optimal tracking controller for chaotic systems with unknown dynamics and disturbances. ADP, characterized by strong self-learning and adaptive abilities, has received significantly increased attention and has become an important brain-like intelligent optimal control method for nonlinear systems.[12–15] ADP algorithms include value iteration (VI) and policy iteration (PI), according to the iterative method used.[16,17] In Ref. [18], a complex-valued ADP algorithm was discussed, in which the optimal control problem of complex-valued nonlinear systems was for the first time successfully solved by PI. In Ref. [19], based on neurocognitive psychology, a novel controller built on multiple actor–critic structures was developed for unknown systems; the proposed controller traded off fast actions based on stored behavior patterns against real-time exploration using current input–output data. In Ref. [20], an effective off-policy integral reinforcement learning (IRL) algorithm was presented, which successfully solved the optimal control problem for completely unknown continuous-time systems with unknown disturbances. Note that the development of ADP also promotes the development of adaptive control and neural network control. For example, in Ref. [21], a novel adaptive nonlinear controller was designed to achieve stochastic synchronization of complex networks; it is less conservative and may be more widely applicable than the traditional adaptive linear controller. In Ref. [22], the synchronization control of memristor-based recurrent neural networks with impulsive perturbations or boundary perturbations was studied, and two kinds of controllers were designed so that the memristive neural networks with perturbations could converge to the equilibrium points, which evoke human memory patterns. In Ref. [23], pinning adaptive synchronization for uncertain complex dynamic networks with multi-link against network deterioration was proposed, and new synchronization criteria for networks with multi-link were derived to ensure that the synchronized states are locally or globally stable under uncertainty and deterioration. It is worth noting that ADP algorithms have been successfully applied to control chaotic systems. In Ref. [24], an optimal tracking control scheme was proposed for a class of discrete-time chaotic systems using an approximation-error-based ADP algorithm. In that work, the system was assumed to be known and the disturbance was not considered, but it laid a foundation for ADP-based chaotic system control.

It is known that optimal control has been extensively used in the design of controllers for nonlinear systems with disturbances.[25] In Ref. [26], significant insight into such design problems was provided by formulating them as min-max two-player zero-sum (ZS) game problems. The optimal control in this scenario is equivalent to finding the Nash equilibrium of the corresponding two-player ZS game,[27] which requires solving the so-called Hamilton–Jacobi–Isaacs (HJI) equation. During the last few years, strong connections between ADP and optimal control have prompted a major effort towards developing reinforcement learning algorithms that learn the solution of the HJI equation arising in the optimal regulation problem. In Refs. [28] and [29], ZS differential games were discussed in the framework of ADP. In Ref. [30], multiplayer ZS differential games for a class of uncertain nonlinear systems were studied. In Ref. [31], multiplayer non-zero-sum differential games were treated by the off-policy IRL method. This previous research supplies a new perspective on the optimal tracking control of chaotic systems with disturbances.

Based on these previous research works, this paper studies the optimal tracking problem for chaotic systems with unknown dynamics and disturbances. First, an augmented system is constructed from the tracking error dynamics and the reference dynamics, and a new cost function is introduced for the optimal tracking problem. The tracking control problem is then transformed into a min-max optimization problem. The PI method is introduced to obtain the iterative cost function using the system dynamics. The off-policy ADP algorithm is then developed to find the solution of the tracking HJI equation online, using only measured data and without any knowledge of the system dynamics. A critic neural network (CNN), an action neural network (ANN), and a disturbance neural network (DNN) are used to approximate the cost function, control, and disturbance, respectively. The neural network (NN) implementation is given with convergence analyses. Finally, two examples are given, and the effectiveness of the proposed synchronous solution method for the optimal tracking control problem of chaotic systems is shown by the simulation results.

The rest of this paper is organized as follows. In Section 2, we present the motivations and preliminaries of the discussed problem. In Section 3, the synchronous solution is developed. In Section 4, the NN implementation is given with convergence analyses. In Section 5, two examples are given to demonstrate the effectiveness of the proposed scheme. In Section 6, the conclusion is drawn.

2. Problem formulation

Let us consider the chaotic system with disturbance described by

$\dot{x}(t) = f(x(t)) + g(x(t))u(t) + h(x(t))d(t), \qquad (1)$
where $x \in \mathbb{R}^n$ is the chaotic system state, $u \in \mathbb{R}^m$ is the control, $d \in \mathbb{R}^q$ is the disturbance, $f(x)$ and $g(x)$ are the unknown system dynamics with $f(0) = 0$, and $h(x)$ is the unknown disturbance gain. Actually, many nonlinear chaotic dynamical systems can be expressed in the form of Eq. (1), such as the Lü system,[32] Chen system,[33,34] Lorenz system,[35,36] several variants of Chua's circuit,[37,38] and the Duffing oscillator.[39,40]

Let $x_d$ be the constant reference trajectory, and we have

$\dot{x}_d = f(x_d) + g(x_d)u_d + h(x_d)d_d = 0. \qquad (2)$

The focus of this paper is to find an optimal control $u(x)$ that makes the chaotic system (1) track the given trajectory $x_d$ while optimizing a cost function. Therefore, the tracking error system is first defined as

$\dot{e}(t) = f(x) - f(x_d) + g(x)u_e(t) + h(x)d_e(t), \qquad (3)$
where the tracking error $e = x - x_d$, $u_e = u - u_d$, $d_e = d - d_d$, and $u_d$ and $d_d$ are the control and disturbance of system (2). In this paper, we assume that $g(x)u_d + h(x)d_d = g(x_d)u_d + h(x_d)d_d$ and that $f(x)$ is Lipschitz. In the tracking error system (3), $d_e$ is the disturbance, which can be seen as another input, and it makes the cost function maximal. Then the ZS differential game will be adopted for the optimal tracking control problem. The cost function is defined as
$J(e, u_e, d_e) = \int_t^{\infty}\big(e^{\mathrm T}Qe + u_e^{\mathrm T}Ru_e - d_e^{\mathrm T}Sd_e\big)\,\mathrm{d}\tau, \qquad (4)$
where $Q$, $R$, and $S$ are positive definite matrices.

Putting Eqs. (2) and (3) together yields the augmented system

$\dot{X} = F(X) + G(X)u_e + H(X)d_e, \qquad (5)$
where $X = [e^{\mathrm T}, x_d^{\mathrm T}]^{\mathrm T}$, $F(X) = [(f(x) - f(x_d))^{\mathrm T}, 0]^{\mathrm T}$, $G(X) = [g^{\mathrm T}(x), 0]^{\mathrm T}$, and $H(X) = [h^{\mathrm T}(x), 0]^{\mathrm T}$ with $x = e + x_d$. By using the augmented system (5), the cost function (4) becomes
$J(X, u_e, d_e) = \int_t^{\infty}\big(X^{\mathrm T}\bar{Q}X + u_e^{\mathrm T}Ru_e - d_e^{\mathrm T}Sd_e\big)\,\mathrm{d}\tau, \qquad (6)$
where $\bar{Q} = \mathrm{diag}(Q, 0)$.

Then the two-player ZS differential game is

$V^{*}(X) = \min_{u_e}\max_{d_e} J(X, u_e, d_e), \qquad (7)$
where $V^{*}(X)$ is the value of the game. In this paper, we assume that the two-player optimal control problem has a unique solution, i.e., the Nash condition holds:
$\min_{u_e}\max_{d_e} J(X, u_e, d_e) = \max_{d_e}\min_{u_e} J(X, u_e, d_e). \qquad (8)$

By Leibniz's formula and differentiation, the nonlinear ZS game Bellman equation, which is given in terms of the Hamiltonian function, is obtained as

$H(X, \nabla V, u_e, d_e) = X^{\mathrm T}\bar{Q}X + u_e^{\mathrm T}Ru_e - d_e^{\mathrm T}Sd_e + (\nabla V)^{\mathrm T}\big(F(X) + G(X)u_e + H(X)d_e\big) = 0, \qquad (9)$
where $\nabla V = \partial V/\partial X$. The stationary conditions are
$\partial H/\partial u_e = 0, \qquad (10)$
$\partial H/\partial d_e = 0. \qquad (11)$
According to Eqs. (9)–(11), we have the optimal control and the disturbance
$u_e^{*} = -\tfrac{1}{2}R^{-1}G^{\mathrm T}(X)\nabla V^{*}, \qquad (12)$
$d_e^{*} = \tfrac{1}{2}S^{-1}H^{\mathrm T}(X)\nabla V^{*}. \qquad (13)$
Substituting Eqs. (12) and (13) into the Bellman equation (9), we can derive $V^{*}$ from the solution of the HJI equation
$X^{\mathrm T}\bar{Q}X + (\nabla V^{*})^{\mathrm T}F(X) - \tfrac{1}{4}(\nabla V^{*})^{\mathrm T}GR^{-1}G^{\mathrm T}\nabla V^{*} + \tfrac{1}{4}(\nabla V^{*})^{\mathrm T}HS^{-1}H^{\mathrm T}\nabla V^{*} = 0. \qquad (14)$

The HJI equation provides the solution to the optimal control problem for the ZS game. When it can be solved, it provides an optimal control in state-variable feedback (i.e., closed-loop) form. The Bellman equation is a partial differential equation for the value. That is, given any stabilizing feedback control policies and yielding finite values, the solution to the Bellman equation (9) is the value given by Eq. (6).
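For intuition, consider the linear-quadratic special case (an illustrative reduction, not part of the derivation above; the matrices $A$, $B$, $D$, and $P$ are introduced here only for this sketch): with dynamics $\dot{X} = AX + Bu_e + Dd_e$ and a quadratic value $V^{*}(X) = X^{\mathrm T}PX$, the HJI equation reduces to the well-known game algebraic Riccati equation.

```latex
% Linear-quadratic reduction of the HJI equation (illustrative sketch):
% dynamics \dot{X} = AX + Bu_e + Dd_e, value V^*(X) = X^T P X, so \nabla V^* = 2PX.
% Substituting into Eq. (14) and matching quadratic forms in X gives
A^{\mathrm T}P + PA + \bar{Q} - PBR^{-1}B^{\mathrm T}P + PDS^{-1}D^{\mathrm T}P = 0,
% with the corresponding saddle-point policies
u_e^{*} = -R^{-1}B^{\mathrm T}PX, \qquad d_e^{*} = S^{-1}D^{\mathrm T}PX.
```

In this special case, solving the HJI equation amounts to finding the stabilizing solution $P$ of the Riccati equation; the nonlinear HJI equation (14) has no such closed form, which motivates the iterative solution that follows.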

As the system dynamics is unknown, a data-based synchronous method will be established to solve the HJI equation (14).

3. Data-based synchronous method

In this section, the PI algorithm is first given. Then the off-policy learning is used to transform the PI algorithm to a synchronous method without system dynamics.

The PI algorithm starts from an initial admissible control pair $\big(u_e^{(0)}, d_e^{(0)}\big)$. Then, for iterative step $i = 0, 1, 2, \ldots$, $V^{(i)}$ is obtained by

$X^{\mathrm T}\bar{Q}X + \big(u_e^{(i)}\big)^{\mathrm T}Ru_e^{(i)} - \big(d_e^{(i)}\big)^{\mathrm T}Sd_e^{(i)} + \big(\nabla V^{(i)}\big)^{\mathrm T}\big(F + Gu_e^{(i)} + Hd_e^{(i)}\big) = 0, \qquad (15)$
and the policy pair is updated by
$u_e^{(i+1)} = -\tfrac{1}{2}R^{-1}G^{\mathrm T}\nabla V^{(i)}, \qquad (16)$
$d_e^{(i+1)} = \tfrac{1}{2}S^{-1}H^{\mathrm T}\nabla V^{(i)}. \qquad (17)$
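To make the structure of this model-based PI concrete, the following minimal sketch runs it on a linear-quadratic ZS game, where the evaluation step is a Lyapunov equation and the improvement step recomputes the gains. All matrices ($A$, $B$, $D$, $Q$, $R$, $S$) are hypothetical examples chosen for illustration; this is not the paper's data-based algorithm, since it uses the model explicitly.

```python
import numpy as np

# Illustrative linear-quadratic ZS game; all matrices below are made-up examples.
A = np.array([[-2.0, 1.0],
              [0.0, -1.0]])      # internal dynamics (stable)
B = np.array([[0.0], [1.0]])     # control input matrix
D = np.array([[1.0], [0.0]])     # disturbance input matrix
Q = np.eye(2)                    # state weight
R = np.eye(1)                    # control weight
S = 10.0 * np.eye(1)             # disturbance weight (large enough for solvability)
n = A.shape[0]

def lyap(Acl, C):
    """Solve Acl^T P + P Acl = -C by vectorization (Kronecker identity)."""
    M = np.kron(np.eye(n), Acl.T) + np.kron(Acl.T, np.eye(n))
    P = np.linalg.solve(M, -C.reshape(n * n, order='F')).reshape(n, n, order='F')
    return (P + P.T) / 2.0

K = np.zeros((1, n))  # control gain, u = -K x  (initial admissible pair)
L = np.zeros((1, n))  # disturbance gain, d = L x

for i in range(60):
    Acl = A - B @ K + D @ L
    # Policy evaluation, cf. Eq. (15): Lyapunov equation for the quadratic value.
    P = lyap(Acl, Q + K.T @ R @ K - L.T @ S @ L)
    # Policy improvement, cf. Eqs. (16)-(17): with V = x^T P x, grad V = 2 P x,
    # so u = -R^{-1} B^T P x and d = S^{-1} D^T P x.
    K = np.linalg.solve(R, B.T @ P)
    L = np.linalg.solve(S, D.T @ P)

# At convergence, P satisfies the game algebraic Riccati equation.
residual = (A.T @ P + P @ A + Q
            - P @ B @ np.linalg.solve(R, B.T) @ P
            + P @ D @ np.linalg.solve(S, D.T) @ P)
print(np.linalg.norm(residual))
```

The simultaneous update of both gains mirrors the policy pair update of Eqs. (16) and (17); the Riccati residual shrinking to numerical zero indicates that the iterates converge to the game value.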

From Eqs. (15)–(17), we can see that the system dynamics are necessary for the PI algorithm. Therefore, the following synchronous method is given, based on the PI algorithm and off-policy learning.

Let $u_e^{(i)}$ and $d_e^{(i)}$ be obtained by Eqs. (16) and (17); then the original system (5) is rewritten as

$\dot{X} = F(X) + G(X)u_e^{(i)} + H(X)d_e^{(i)} + G(X)\big(u_e - u_e^{(i)}\big) + H(X)\big(d_e - d_e^{(i)}\big), \qquad (18)$
where $u_e$ and $d_e$ are the behavior control and disturbance actually applied to the system. According to Eq. (18), we have
$\dot{V}^{(i)} = \big(\nabla V^{(i)}\big)^{\mathrm T}\big(F + Gu_e^{(i)} + Hd_e^{(i)}\big) + \big(\nabla V^{(i)}\big)^{\mathrm T}G\big(u_e - u_e^{(i)}\big) + \big(\nabla V^{(i)}\big)^{\mathrm T}H\big(d_e - d_e^{(i)}\big). \qquad (19)$
From Eqs. (16) and (17), we have
$\big(\nabla V^{(i)}\big)^{\mathrm T}G = -2\big(u_e^{(i+1)}\big)^{\mathrm T}R, \qquad (20)$
$\big(\nabla V^{(i)}\big)^{\mathrm T}H = 2\big(d_e^{(i+1)}\big)^{\mathrm T}S. \qquad (21)$
Then equation (19) becomes
$\dot{V}^{(i)} = \big(\nabla V^{(i)}\big)^{\mathrm T}\big(F + Gu_e^{(i)} + Hd_e^{(i)}\big) - 2\big(u_e^{(i+1)}\big)^{\mathrm T}R\big(u_e - u_e^{(i)}\big) + 2\big(d_e^{(i+1)}\big)^{\mathrm T}S\big(d_e - d_e^{(i)}\big). \qquad (22)$
Thus, integrating Eq. (22) over $[t, t+T]$ and using Eq. (15), the off-policy Bellman equation for the ZS game is expressed as
$V^{(i)}(X(t+T)) - V^{(i)}(X(t)) = \int_t^{t+T}\Big[-X^{\mathrm T}\bar{Q}X - \big(u_e^{(i)}\big)^{\mathrm T}Ru_e^{(i)} + \big(d_e^{(i)}\big)^{\mathrm T}Sd_e^{(i)} - 2\big(u_e^{(i+1)}\big)^{\mathrm T}R\big(u_e - u_e^{(i)}\big) + 2\big(d_e^{(i+1)}\big)^{\mathrm T}S\big(d_e - d_e^{(i)}\big)\Big]\,\mathrm{d}\tau. \qquad (23)$
In Eq. (23), knowledge of the system dynamics is avoided; the equation involves only data measured along the system trajectories. In the following section, the NN implementation procedure is presented with convergence analysis.

4. NN implementation

The NN implementation procedure is first given in this section. Then the convergence of the NN weight is analyzed.

The neural network expression of the CNN is given as

$V^{(i)}(X) = W_c^{\mathrm T}\phi_c(X) + \varepsilon_c, \qquad (24)$
where $W_c$ is the ideal weight of the critic network, $\phi_c(X)$ is the activation function, and $\varepsilon_c$ is the residual error. Let the estimation of $W_c$ be $\hat{W}_c$. Then the estimations of $V^{(i)}$ and its gradient are
$\hat{V}^{(i)}(X) = \hat{W}_c^{\mathrm T}\phi_c(X), \qquad (25)$
$\nabla\hat{V}^{(i)}(X) = \big(\nabla\phi_c(X)\big)^{\mathrm T}\hat{W}_c. \qquad (26)$

The neural network expression of the ANN is

$u_e^{(i+1)}(X) = W_a^{\mathrm T}\phi_a(X) + \varepsilon_a, \qquad (27)$
where $W_a$ is the ideal weight of the action network, $\phi_a(X)$ is the activation function, and $\varepsilon_a$ is the residual error. Let $\hat{W}_a$ be the estimation of $W_a$; then the estimation of $u_e^{(i+1)}$ is
$\hat{u}_e^{(i+1)}(X) = \hat{W}_a^{\mathrm T}\phi_a(X). \qquad (28)$

The neural network expression of the DNN is

$d_e^{(i+1)}(X) = W_d^{\mathrm T}\phi_d(X) + \varepsilon_d, \qquad (29)$
where $W_d$ is the ideal weight of the disturbance network, $\phi_d(X)$ is the activation function, and $\varepsilon_d$ is the residual error. Let $\hat{W}_d$ be the estimation of $W_d$; then the estimation of $d_e^{(i+1)}$ is
$\hat{d}_e^{(i+1)}(X) = \hat{W}_d^{\mathrm T}\phi_d(X). \qquad (30)$

Substituting Eqs. (25), (28), and (30) into Eq. (23), we can define the equation error as

$e(t) = \hat{V}^{(i)}(X(t+T)) - \hat{V}^{(i)}(X(t)) + \int_t^{t+T}\Big[X^{\mathrm T}\bar{Q}X + \big(\hat{u}_e^{(i)}\big)^{\mathrm T}R\hat{u}_e^{(i)} - \big(\hat{d}_e^{(i)}\big)^{\mathrm T}S\hat{d}_e^{(i)} + 2\big(\hat{u}_e^{(i+1)}\big)^{\mathrm T}R\big(u_e - \hat{u}_e^{(i)}\big) - 2\big(\hat{d}_e^{(i+1)}\big)^{\mathrm T}S\big(d_e - \hat{d}_e^{(i)}\big)\Big]\,\mathrm{d}\tau. \qquad (31)$
In terms of the weight estimates, equation (31) is
$e(t) = \hat{W}_c^{\mathrm T}\Delta\phi_c(t) + \int_t^{t+T}\Big[X^{\mathrm T}\bar{Q}X + \big(\hat{u}_e^{(i)}\big)^{\mathrm T}R\hat{u}_e^{(i)} - \big(\hat{d}_e^{(i)}\big)^{\mathrm T}S\hat{d}_e^{(i)} + 2\phi_a^{\mathrm T}\hat{W}_aR\big(u_e - \hat{u}_e^{(i)}\big) - 2\phi_d^{\mathrm T}\hat{W}_dS\big(d_e - \hat{d}_e^{(i)}\big)\Big]\,\mathrm{d}\tau, \qquad (32)$
where $\Delta\phi_c(t) = \phi_c(X(t+T)) - \phi_c(X(t))$. By the Kronecker product identity $\mathrm{vec}(AXB) = (B^{\mathrm T}\otimes A)\,\mathrm{vec}(X)$, it has
$2\phi_a^{\mathrm T}\hat{W}_aR\big(u_e - \hat{u}_e^{(i)}\big) = 2\Big(\big(u_e - \hat{u}_e^{(i)}\big)^{\mathrm T}R\otimes\phi_a^{\mathrm T}\Big)\mathrm{vec}\big(\hat{W}_a\big), \qquad (33)$
$2\phi_d^{\mathrm T}\hat{W}_dS\big(d_e - \hat{d}_e^{(i)}\big) = 2\Big(\big(d_e - \hat{d}_e^{(i)}\big)^{\mathrm T}S\otimes\phi_d^{\mathrm T}\Big)\mathrm{vec}\big(\hat{W}_d\big). \qquad (34)$
Then we can define
$\theta_c(t) = \Delta\phi_c(t), \qquad (35)$
$\theta_a(t) = \int_t^{t+T}2\Big(\big(u_e - \hat{u}_e^{(i)}\big)^{\mathrm T}R\otimes\phi_a^{\mathrm T}\Big)^{\mathrm T}\mathrm{d}\tau, \qquad (36)$
$\theta_d(t) = -\int_t^{t+T}2\Big(\big(d_e - \hat{d}_e^{(i)}\big)^{\mathrm T}S\otimes\phi_d^{\mathrm T}\Big)^{\mathrm T}\mathrm{d}\tau, \qquad (37)$
$\pi(t) = \int_t^{t+T}\Big(X^{\mathrm T}\bar{Q}X + \big(\hat{u}_e^{(i)}\big)^{\mathrm T}R\hat{u}_e^{(i)} - \big(\hat{d}_e^{(i)}\big)^{\mathrm T}S\hat{d}_e^{(i)}\Big)\mathrm{d}\tau. \qquad (38)$
Therefore, equation (32) becomes
$e(t) = \hat{W}^{\mathrm T}\theta(t) + \pi(t), \qquad (39)$
where $\hat{W} = \big[\hat{W}_c^{\mathrm T}, \mathrm{vec}(\hat{W}_a)^{\mathrm T}, \mathrm{vec}(\hat{W}_d)^{\mathrm T}\big]^{\mathrm T}$ is the augmented weight vector and $\theta(t) = \big[\theta_c^{\mathrm T}(t), \theta_a^{\mathrm T}(t), \theta_d^{\mathrm T}(t)\big]^{\mathrm T}$.
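The vectorization step rests on the Kronecker identity $\mathrm{vec}(AXB) = (B^{\mathrm T}\otimes A)\,\mathrm{vec}(X)$, which turns terms that are bilinear in an unknown weight matrix into terms linear in its vectorized form. A quick numerical check (all matrix sizes and values below are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)
vec = lambda M: M.reshape(-1, order='F')  # column-stacking vectorization

# General identity: vec(A X B) = (B^T kron A) vec(X).
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
assert np.allclose(lhs, rhs)

# Applied to a scalar weight term of the form phi^T W v (the shape of the
# terms handled by the Kronecker step): it becomes linear in vec(W).
phi = rng.standard_normal(3)      # activation vector
W = rng.standard_normal((3, 2))   # unknown weight matrix
v = rng.standard_normal(2)        # known data vector
scalar = phi @ W @ v
linear_in_vecW = np.kron(v, phi) @ vec(W)
assert np.allclose(scalar, linear_in_vecW)
```

This is exactly why the equation error can be written as in Eq. (39): every unknown enters linearly through the stacked vector of network weights.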

Define

$E(t) = \tfrac{1}{2}e^{2}(t). \qquad (40)$
According to the gradient descent method, we have the update law of $\hat{W}$ as
$\dot{\hat{W}} = -\alpha\frac{\partial E(t)}{\partial \hat{W}} = -\alpha\,\theta(t)\,e(t), \qquad (41)$
where $\alpha$ is a positive number (the learning rate).
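The update law above is a standard gradient step on the squared residual of a linear-in-parameters equation. The discrete-time sketch below shows the mechanism on synthetic data (the regressor samples and the "ideal" weight are invented for illustration; a normalization term is added for step-size robustness, a common practical variant):

```python
import numpy as np

rng = np.random.default_rng(1)
W_true = np.array([1.5, -2.0, 0.5])  # hypothetical "ideal" weight vector

W_hat = np.zeros(3)  # weight estimate
alpha = 0.5          # learning rate (the positive number in the update law)

for step in range(2000):
    theta = rng.standard_normal(3)       # regressor sample (persistently exciting)
    e = W_hat @ theta - W_true @ theta   # residual of the linear equation
    # Gradient of E = e^2 / 2 w.r.t. W_hat is e * theta; the denominator
    # normalizes the step so any alpha in (0, 2) is stable.
    W_hat -= alpha * e * theta / (1.0 + theta @ theta)

print(np.round(W_hat, 4))
```

With persistently exciting regressors the estimate converges to the true weight; without excitation it converges only to a subspace, which is why the measured data in the off-policy scheme must be sufficiently rich.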

Based on the NN implementation and off-policy learning, the convergence of the synchronous method can be established.

5. Simulation study

In this section, two simulation examples are given to demonstrate the effectiveness of the proposed method.

5.1. Example 1

Consider the chaotic system described by the following differential equation:[41]

$\dot{x} = f(x) + g(x)u + h(x)d, \qquad (57)$

where $x = [x_1, x_2, x_3]^{\mathrm T}$, $u$ is the control input, and $d$ is the disturbance input. Let
$f(x) = \big[(25\beta + 10)(x_2 - x_1),\ (28 - 35\beta)x_1 - x_1x_3 + (29\beta - 1)x_2,\ x_1x_2 - \tfrac{8+\beta}{3}x_3\big]^{\mathrm T}, \qquad (58)$
$g(x) = h(x) = \mathrm{diag}(1, 1, 1). \qquad (59)$
Here, letting β = 0, system (57) becomes the Lorenz system when perturbations are not present. The trajectory of system (57) is shown in Fig. 1.

Fig. 1. (color online) Lorenz system where β = 0.
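The attractor of Fig. 1 can be reproduced by integrating the unperturbed dynamics (u = d = 0, β = 0), which are the classical Lorenz equations; a minimal fixed-step RK4 sketch (step size and initial condition chosen arbitrarily):

```python
import numpy as np

def lorenz(s):
    # Unperturbed system with beta = 0: classical Lorenz dynamics
    # (sigma = 10, rho = 28, b = 8/3).
    x, y, z = s
    return np.array([10.0 * (y - x),
                     28.0 * x - y - x * z,
                     x * y - 8.0 / 3.0 * z])

def rk4_step(f, s, dt):
    # One classical fourth-order Runge-Kutta step.
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

dt, steps = 0.01, 5000
traj = np.empty((steps, 3))
s = np.array([1.0, 1.0, 1.0])  # arbitrary initial condition
for k in range(steps):
    traj[k] = s
    s = rk4_step(lorenz, s, dt)
# traj now traces the butterfly-shaped attractor of Fig. 1;
# plotting traj[:, 0] against traj[:, 2] reproduces the familiar picture.
```

The trajectory remains bounded while visiting both wings of the attractor, which is the chaotic behavior the tracking controller must suppress.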

In this example, the desired trajectory is a constant reference $x_d$. We select hyperbolic tangent functions as the activation functions of the critic, action, and disturbance networks. The structures of the critic, action, and disturbance networks are 3−8−1, 3−8−3, and 3−8−3, respectively. The initial weight W is selected arbitrarily from (−1, 1). For the cost function, Q, R, and S in the utility function are identity matrices of appropriate dimensions. After 150 time steps, the simulation results are obtained. Figures 2 and 3 show the control and disturbance input trajectories. Based on these inputs, the tracking error is given in Fig. 4, which shows that the tracking error is convergent. The chaotic system state is demonstrated in Fig. 5. It is clear that the chaotic system tracks the given trajectories.

Fig. 2. (color online) Control input trajectories.
Fig. 3. (color online) Disturbance input trajectories.
Fig. 4. (color online) Tracking error.
Fig. 5. (color online) System state.
5.2. Example 2

Consider the following chaotic system:[42,43]

$\dot{x} = f(x) + gu + hd, \quad f(x) = \big[a(x_2 - x_1),\ -x_1x_3 + cx_2,\ x_1x_2 - bx_3\big]^{\mathrm T}, \qquad (60)$

where $x = [x_1, x_2, x_3]^{\mathrm T}$, g = diag(5,5,5), and h = diag(1,1,1). When a = 36, b = 3, and c = 20, the internal system (60) without perturbations is the Lü chaotic system, whose attractor is shown in Fig. 6.

Fig. 6. (color online) Lü chaotic attractor.

In this example, the desired trajectory is a constant reference $x_d$. The matrices Q, R, and S in the utility function are identity matrices of appropriate dimensions. For the critic, action, and disturbance networks, the activation functions are hyperbolic tangent functions. The structures are 3−8−1, 3−8−3, and 3−8−3, respectively. The initial weight W is selected arbitrarily from (−1, 1). After 100 time steps, the simulation results are shown in Figs. 7–10. In Figs. 7 and 8, the control and disturbance inputs are given, which are convergent. Under the action of these inputs, the tracking error trajectories are displayed in Fig. 9, and they converge to zero. At last, the closed-loop chaotic system state based on the inputs is demonstrated in Fig. 10. It can be seen that the proposed method effectively makes the chaotic system track the given trajectories.

Fig. 7. (color online) Control input trajectories.
Fig. 8. (color online) Disturbance input trajectories.
Fig. 9. (color online) Tracking error.
Fig. 10. (color online) State.
6. Conclusion

An optimal tracking control method for chaotic systems with unknown dynamics and disturbances is proposed in this paper. The tracking error dynamics and the reference dynamics make up the augmented system, and a new cost function is introduced for the optimal tracking control problem. PI is introduced to solve the min-max optimization problem. The off-policy learning method is applied to update the iterative cost function using only measured data and without any knowledge of the system dynamics. The CNN, ANN, and DNN are used to approximate the cost function, control, and disturbance, respectively, with convergence analyses. It is proven that the closed-loop tracking error system is convergent. Simulation results are given to show the effectiveness of the proposed synchronous solution method for the chaotic system tracking problem. Future research will apply the proposed approach to the control problem for a class of systems with interconnection terms, and analyze the convergence of PI in ZS games.

References
[1] Shen Z Li J 2016 Mathematics and Computers in Simulation
[2] Wang G Cai B Jin P Hu T 2016 Chin. Phys. B 25 010503
[3] Wang W Zhang X Chang Y Wang X Wang Z Chen X Zheng L 2016 Chin. Phys. B 25 010202
[4] Zhang Z Y Feng X Q Yao Z H Jia H Y 2015 Chin. Phys. B 24 110503
[5] Wu J Wang L Chen G Duan S 2016 Chaos Soliton. Fract. 92 20
[6] Lü J H Chen G R 2002 International Journal of Bifurcation and Chaos 12 659
[7] Ma T Zhang H Fu J 2008 Chin. Phys. B 17 4407
[8] Ma T Fu J 2011 Chin. Phys. B 20 050511
[9] Yang D 2014 Chin. Phys. B 23 010504
[10] Song R Xiao W Wei Q 2014 Chin. Phys. B 23 050504
[11] Wei Q Song R Xiao W Sun Q 2015 Chin. Phys. B 24 090504
[12] Wei Q Liu D Lin H 2016 IEEE Transactions on Cybernetics 46 840
[13] Zhang H Feng T Yang G H Liang H 2015 IEEE Transactions on Cybernetics 45 1315
[14] Wei Q Wang F Liu D Yang X 2014 IEEE Transactions on Cybernetics 44 2820
[15] Modares H Lewis F L 2014 IEEE Transactions on Automatic Control 59 3051
[16] Liu D Wei Q 2013 IEEE Transactions on Cybernetics 43 779
[17] Nguyen T L 2016 Neurocomputing
[18] Song R Xiao W Zhang H Sun C 2014 IEEE Transactions on Neural Networks and Learning Systems 25 1733
[19] Song R Lewis F L Wei Q Zhang H Jiang Z P 2015 IEEE Transactions on Neural Networks and Learning Systems 26 851
[20] Song R Lewis F L Wei Q Zhang H 2016 IEEE Transactions on Cybernetics 46 1041
[21] Wang W Li L Peng H Xiao J Yang Y 2014 Nonlinear Dynamics 76 591
[22] Wang W Li L Peng H Xiao J Yang Y 2014 Neural Networks 53 8
[23] Li L Li W Kurths J Luo Q Yang Y Li S 2015 Chaos Soliton. Fract. 72 20
[24] Song R Xiao W Sun C Wei Q 2013 Chin. Phys. B 22 090502
[25] Basar T Bernhard P 1995 H∞ Optimal Control and Related Minimax Design Problems Boston Birkhäuser
[26] Basar T Olsder G J 1999 Dynamic Noncooperative Game Theory 2 23
[27] Devasia S Chen D Paden B 1996 IEEE Transactions on Automatic Control 41 930
[28] Zhang H Wei Q Liu D 2011 Automatica 47 207
[29] Wei Q Song R Yan P 2016 IEEE Transactions on Neural Networks and Learning Systems 27 444
[30] Liu D Wei Q 2014 International Journal of Adaptive Control and Signal Processing 28 205
[31] Song R Lewis F L Wei Q 2016 IEEE Transactions on Neural Networks and Learning Systems
[32] Zhang F Liao X Zhang G 2016 Applied Mathematics and Computation 284 332
[33] Chen G Ueta T 1999 International Journal of Bifurcation and Chaos 9 1465
[34] Leonov G A Kuznetsov N V 2015 Applied Mathematics and Computation 256 334
[35] Lorenz E 1963 Journal of the Atmospheric Sciences 20 130
[36] Leonov G A Kuznetsov N V Korzhemanova N A Kusakin D V 2016 Communications in Nonlinear Science and Numerical Simulation 41 84
[37] Chua L Komuro M Matsumoto T 1986 IEEE Transactions on Circuits and Systems 33 1072
[38] Alkahtani B S T 2016 Chaos Soliton. Fract. 89 547
[39] Wiggins S 1987 Phys. Lett. A 124 138
[40] Simo H Woafo P 2016 Optik-International Journal for Light and Electron Optics 127 8760
[41] Vargas J A R Grzeidak E Gularte K H M 2016 Neurocomputing 174 1038
[42] Lü J Chen G Zhang S 2002 International Journal of Bifurcation and Chaos 12 1001
[43] Lü J Chen G Zhang S 2002 Chaos Soliton. Fract. 14 669